Home Mortgage Disclosure Act data provides insight into which loans are approved and denied and includes details on loan characteristics such as use, amount, outcome, and on applicant characteristics such as race, income, and census tract. These graphs attempt to understand why some people receive loans while others do not, where most loans are approved and denied, and more.
#approval rates by census tract
#number of loans apps by census tract
apps_by_tract <- hmda_ny_5yrs %>%
group_by(census_tract_number) %>%
summarise(total_apps = n())
#approvals by tract
approval_counts_by_tract <- hmda_ny_5yrs %>%
filter(outcome == "Approved") %>%
group_by(census_tract_number) %>%
summarise(total_approved = n())
approval_rates_by_tract <-
approval_counts_by_tract %>%
left_join(apps_by_tract, by = "census_tract_number") %>%
mutate(approval_rate = total_approved/total_apps)
census_tract_merged <- left_join(census_tracts, approval_rates_by_tract,
by = c("ct2010" ="census_tract_number")) %>%
st_as_sf()
#dealing with ggplot
census_tract_merged$breaks <- cut(census_tract_merged$approval_rate, c(0, .2, .4, .6, .8, 1))
#plot- approval rates by census block ####
ggplot() +
geom_sf(data = census_tract_merged,
aes(fill = breaks), color = "white", lwd = 0) +
labs(title = "Most census tracts have approval rates between \n60% and 80%", subtitle= "The highest approval rates are found \nin Queens, while rates dip below 40% in historically \ndisdvantaged parts of the city, such as East Brooklyn",
caption = "Home Mortgage Disclosure Act Data, CFPB, 2013-2017") +
theme(axis.text=element_blank(),
axis.title=element_blank(),
panel.background = element_rect(fill = "#ffffff", color = NA),
axis.line = element_blank(),
legend.position = c(0.25,.75),
text = element_text(family = "Meiryo"),
axis.ticks = element_blank()) + guides(fill=guide_legend(title="Approval Rate")) + scale_fill_manual(values = c('#ffffd4', '#fdd2c0', '#f3a683', '#e27a48', '#cc4c02'))
The map above and the next map create an understanding of loan approval rates across census tracts. While there are clearly differences in loan approval rates across the city, there is not a clear distinction between certain areas with higher approval rates, for example, there are high approval tracts in Brooklyn and the Bronx that are not know for being high income (as one might expect). In terms of low approval rates, this graph shows us that parts of East Brookyn, like Brownsville have lower rates than average.
#map of loan approval rates for low income applicants
census_tracts_bk <- census_tracts %>% filter(boro_name == "Brooklyn")
hmda_ny_5yrs <- hmda_ny_5yrs %>% mutate(income_bins = ifelse(applicant_income_000s %in% c(1:73), "Income Quartile 1",
ifelse(applicant_income_000s %in% c(74:108), "Income Quartile 2",
ifelse(applicant_income_000s %in% c(109:180), "Income Quartile 3",
ifelse(applicant_income_000s %in% c(181:133549), "Income Quartile 4", NA)))))
apps_outcome_Q1_bk <- hmda_ny_5yrs %>% filter(income_bins == "Income Quartile 1") %>% group_by(census_tract_number, as_of_year, outcome) %>% summarise(outcome_tot = n())
apps_total_Q1_bk <- hmda_ny_5yrs %>% filter(income_bins == "Income Quartile 1") %>% group_by(census_tract_number, as_of_year) %>% summarise(tot_apps = n())
approval_rate_5yr_changes <- apps_outcome_Q1_bk %>% left_join(apps_total_Q1_bk, by = c("census_tract_number", "as_of_year")) %>% filter(outcome == "Approved", as_of_year %in% c(2013, 2017)) %>% mutate(approval_rate = outcome_tot/tot_apps)
approval_rate_5yr_changes$prior <- lag(approval_rate_5yr_changes$approval_rate)
census_tract_merged <- approval_rate_5yr_changes %>% filter(as_of_year == 2017) %>% mutate(approval_rate_change = approval_rate - prior) %>% right_join(census_tracts_bk, by = c("census_tract_number" = "ct2010"))
census_tract_merged$breaks <- cut(census_tract_merged$approval_rate_change, c(-0.8, -0.3, 0, 0.3, 0.8))
ggplot() + geom_sf(data = census_tract_merged,
aes(fill = breaks), color = "white", lwd = 0) + guides(fill=guide_legend(title="Change in Approval Rate \nbetween 2013-2017")) + scale_fill_manual(values = c('#cc4c02', '#e27a48', "#a6d96a", "#1a9641")) + theme(axis.text=element_blank(),
axis.title=element_blank(),
panel.background = element_rect(fill = "#ffffff", color = NA),
axis.line = element_blank(),
legend.position = c(.2, .8), legend.background = element_blank(),
text = element_text(family = "Meiryo"),
axis.ticks = element_blank()) + labs(title = "Approval rates for low-income applicants in Brooklyn \nhave decreased significantly in gentrifying areas \nsuch as Bed-Stuy, East Williamsburg, and Crown Heights", subtitle = "However, many areas slightly further from Manhattan \nshow increased approval rates", caption = "Home Mortgage Disclosure Act Data, CFPB, 2013-2017")
When looking at approval rates over time for low-income applicants, we can see a decrease in approval rates in areas like Bed-Stuy and Crown Heights, which are gentrifying quickly. This might suggest that the properties are not affordable to low-income homeowners.
#approval rates for low income applicants, by NTA faceted by year
hmda_ny_5yrs <- hmda_ny_5yrs %>% mutate(income_bins = ifelse(applicant_income_000s %in% c(1:73), "Income Quartile 1",
ifelse(applicant_income_000s %in% c(74:108), "Income Quartile 2",
ifelse(applicant_income_000s %in% c(109:180), "Income Quartile 3",
ifelse(applicant_income_000s %in% c(181:133549), "Income Quartile 4", NA)))))
apps_outcome_Q1_nta <- hmda_ny_5yrs %>% filter(income_bins == "Income Quartile 1") %>% group_by(ntacode, as_of_year, outcome) %>% summarise(outcome_tot = n())
apps_total_Q1_nta <- hmda_ny_5yrs %>% filter(income_bins == "Income Quartile 1") %>% group_by(ntacode, as_of_year) %>% summarise(tot_apps = n())
approval_rate_5yr <- apps_outcome_Q1_nta %>% left_join(apps_total_Q1_nta, by = c("ntacode", "as_of_year")) %>% filter(outcome == "Approved") %>% mutate(approval_rate = outcome_tot/tot_apps)
approval_rate_5yr$breaks <- cut(approval_rate_5yr$approval_rate, c(0, 0.2, 0.4, 0.6, 0.8, 1))
#need to merge back with census hmda and then merge that with census tracts
approval_rate_5yr %>% left_join(nta_shape, by = "ntacode") %>% ggplot() + geom_sf(aes(fill = breaks), lwd = 0) + facet_wrap(~as_of_year) + scale_fill_manual(values = c('#ffffd4', '#fdd2c0', '#f3a683', '#e27a48', '#cc4c02')) + guides(fill=guide_legend(title="Approval Rate")) + theme(axis.text=element_blank(),
axis.title=element_blank(),
panel.background = element_rect(fill = "#ffffff", color = NA),
axis.line = element_blank(),
legend.position = c(.8, .3), legend.background = element_blank(),
text = element_text(family = "Meiryo"),
axis.ticks = element_blank()) + labs(title ="Approval rates for low income applicants decline in Manhattan, \nthe Bronx, and Brooklyn between 2013 and 2017", subtitle = "More NTAs in Queens experience an increase in approval rates", caption = "Home Mortgage Disclosure Act Data, CFPB, 2013-2017")
This map showcases the way approval rates have changed each year for low-income applicants, with darker areas representing higher approval rates. It is clear that as time passes, the overall picture of the city becomes lighter, and the higher approval rate areas become concentrated in Queens.
#map with income to loan amount ratios by census tract
#general rule - can afford to finance a mortgage between 2-2.5x gross yearly income
hmda_ny_5yrs <- hmda_ny_5yrs %>% mutate(income_as_part_of_loan = applicant_income_000s/loan_amount_000s)
hmda_ny_5yrs <- hmda_ny_5yrs %>%
mutate(win_loan_amount = winsorize(loan_amount_000s, probs = c(.01, .99)))
hmda_ny_5yrs <- hmda_ny_5yrs %>%
mutate(win_income = winsorize(applicant_income_000s, probs = c(.01, .99)))
hmda_ny_5yrs <- hmda_ny_5yrs %>% mutate(win_loan_income_ratio = win_loan_amount/win_income)
#need only people who applied without a co-applicant/ look at data dictionary
test <- hmda_ny_5yrs %>% filter(as_of_year == 2017, co_applicant_ethnicity_name == "No co-applicant") %>%
group_by(census_tract_number) %>%
summarise(mean_loan_income_ratio =
mean(win_loan_income_ratio, na.rm = TRUE)) %>%
left_join(census_tracts, by = c("census_tract_number" = "ct2010"))
test$breaks <- cut(test$mean_loan_income_ratio, c(0, 2, 3, 4, 240))
ggplot(test) +
geom_sf(aes(fill = breaks), color = "white", lwd = 0) + scale_fill_manual(values = c("#1a9641", "#a6d96a", '#e27a48', '#cc4c02')) + theme(axis.text=element_blank(),
axis.title=element_blank(),
panel.background = element_rect(fill = "#ffffff", color = NA),
axis.line = element_blank(),
legend.position = c(.3, .7),
legend.background = element_blank(),
text = element_text(family = "Meiryo"),
axis.ticks = element_blank()) + labs(title = "Most census tracts have an average loan-to-income ratio over 3.", subtitle = "The general rule is that one can finance a mortgage between 2-2.5 times \ntheir annual income. This means that many applicants are \nattempting to borrow outside of their means.", caption = "Home Mortgage Disclosure Act Data, CFPB, 2013-2017", fill = "Loan-to-income ratio")
# This map looks at approval rates for Black and American American loan applicants by census tracts. When compared to the previous graph, it is clear the Black approval rates are much lower than the average approval rates in most of the census tracts. Many of the tracts that had rates between 50-75 percent, are now in the 25-50 percent range. This indicates that Black applicants receive below average loan approval rates in many neighborhoods. Understanding the cause of this outcome would be helpful in improving socioceonomic equality, as homeownership is an important step towards gaining wealth.
Because the HMDA data does not allow for a calculation of payment to income ratio, which is regularly used to measure the ability of an applicant to pay off the loan on their current income, I calculated a loan-to-income ratio. According to investopedia.com an applicant can afford a loan around 2-2.5x the amount of the applicant’s income. I used this information to look at what parts of the city have applicants within and below this metric.
apps_per_year_outcome_race <- hmda_ny_5yrs %>%
group_by(as_of_year, race_alternative, outcome) %>%
summarise(apps_per_year_outcome_race = n())
apps_per_year_outcome <- hmda_ny_5yrs %>%
group_by(as_of_year, race_alternative) %>%
summarise(apps_per_year_outcome = n())
#approval rates over the five years ####
approval_rate_year_race <- apps_per_year_outcome_race %>%
left_join(apps_per_year_outcome,
by = c("as_of_year", "race_alternative")) %>%
mutate(outcome_rate = apps_per_year_outcome_race/apps_per_year_outcome) %>%
filter(outcome == "Approved")
ggplot(approval_rate_year_race,
aes(x = as_of_year, y = outcome_rate, color = race_alternative)) +
geom_point() + geom_line() +
scale_x_continuous(breaks = c(2013, 2014, 2015, 2016, 2017)) +
labs(title = "Loan approval rates for Black applicants \n are consistently below those of Whites and Asians applicants", x = "Year", y = "Approval Rate", caption = "Home Mortgage Disclosure Act Data, CFPB, 2013-2017") +
geom_text_repel(data = filter(approval_rate_year_race, as_of_year == 2017),
aes(x = 2017, label=race_alternative)) +
theme(legend.position = "none", text = element_text(family = "Meiryo"), panel.background = element_rect(fill = '#F2F5F2'), panel.grid.minor = element_line(linetype = "dashed"), axis.ticks = element_blank()) + scale_color_manual(values = c('#E8291A','#FD7400', '#004358', '#1f8a70', '#BEDB39'))
This graph shows that Asians and Whites have consistently had the highest approval rates. The unknown category is made up of people who did not choose the dislose their race in their mortgage application, however, given that they have the highest approval rates, there is a large chance that this group is made up mostly of White applicants. Most races have had consistent approval rates over the last five years, however, American Indians and Native Alaskans saw a sharp decrease in their loan approval rates in 2016. This graph displays important differences in racial approval rates and inspires further research on whether these differences are due to historical differences in wealth or racial discrimination in lending.
#denial reasons over time
denial_reasons_year <- hmda_ny_5yrs %>% filter(outcome == "Denied", !is.na(denial_reason_name_1)) %>%
group_by(as_of_year, denial_reason_name_1) %>% summarise(count_per_reason = n())
denial_count_year <- hmda_ny_5yrs %>% filter(outcome == "Denied", !is.na(denial_reason_name_1)) %>%
group_by(as_of_year) %>% summarise(tot_denied = n())
#denial reasons changing over time ####
denial_reasons_year %>%
left_join(denial_count_year, by = "as_of_year") %>%
mutate(denial_rate = count_per_reason/tot_denied) %>%
ggplot(aes(x = as_of_year, y = denial_rate, color = denial_reason_name_1)) +
geom_point() + geom_line() + labs(title = "Debt-to-Income Ratio is consistently the most \nfrequent reason for denial",
subtitle = "Insufficient Collateral and Incomplete Applications have been decreasing as \ndenial reasons while Credit History-related denials have been increasing",
caption = "Home Mortgage Disclosure Act Data, CFPB, 2013-2017",
x = "Year", y = "Denial Rate", color = "Denial Reason") + scale_color_manual(values = c('#E8291A','#FFE11A', '#004358', '#1f8a70', '#BEDB39', '#FD7400', '#BF4182', '#580F45', '#0D177F')) + theme(text = element_text(family = "Meiryo"), panel.background = element_rect(fill = '#F2F5F2'), axis.ticks = element_blank())
The following graph explores whether and how denial reasons have changed over time. While the differences in these rates of denial are minor, the highest frequcny rate suggests that many people are applying for mortgages that the banks believe they are unable to afford on their current incomes. Is this because the property values are high and people want to remain in a certain area or is this because people are simply attracted to things outside of their price ranges? These are some of the research questions raised by the graph.
num_denied_by_race <- hmda_ny_5yrs %>%
filter(outcome == "Denied", !is.na(denial_reason_name_1)) %>%
group_by(applicant_race_name_1) %>%
summarise(tot_apps_denied = n())
denial_reasons_race <- hmda_ny_5yrs %>%
filter(outcome == "Denied", !is.na(denial_reason_name_1)) %>%
group_by(applicant_race_name_1, denial_reason_name_1) %>%
summarise(denied_by_reason = n())
#denial reasons rates race ####
denial_reasons_race %>%
left_join(num_denied_by_race, by = "applicant_race_name_1") %>%
mutate(denial_rate = denied_by_reason/tot_apps_denied) %>%
ggplot() +
geom_bar(aes(x = applicant_race_name_1,
y = denial_rate,
fill = denial_reason_name_1),
stat = "identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1), text = element_text(family = "Meiryo"), panel.background = element_rect(fill = '#F2F5F2'), axis.ticks = element_blank()) +
labs(title = "Credit History and Debt-to-Income Ratio are the two \nmost frequent loan denial reasons",
subtitle = "However, the distribution of reasons differ significantly by race",
caption = "Home Mortgage Disclosure Act Data, CFPB, 2013-2017",
x = "Applicant Race", y = "Denial Rate", fill = "Denial Reason") + scale_fill_discrete(name = "Denial Reason") + scale_fill_manual(values = c('#E8291A','#FFE11A', '#004358', '#1f8a70', '#BEDB39', '#FD7400', '#BF4182', '#580F45', '#0D177F'))
Having seen the overall denial rates, the following graph attempts to visualize how the racial breakdown of denial reasons differs from the averages. Asians and Whites are very rarely denied because of their credit histories while Blacks, Native Americans, and Pacific Islanders are most frequently denied because of their credit histories. This begs a better understanding of why certain racial groups would have worse credit histories, whether this measure if being used to deny people because it is more subjective than other measures, or what can be done to improve the credit histories of historically disadvantaged groups.
#looking at differences in approval rates by male/female
hmda_ny_5yrs <- hmda_ny_5yrs %>%
mutate(win_income = winsorize(applicant_income_000s, probs = c(.01, .99)))
ggplot(filter(hmda_ny_5yrs, applicant_sex_name %in% c("Female","Male")),
aes(y=win_income, x=applicant_sex_name, fill=outcome), alpha = .3) +
geom_boxplot() + scale_y_log10(expand=c(0,0)) +
scale_fill_manual(values = c('#1f8a70', '#E8291A', '#FFE11A')) +
theme(text = element_text(family = "Meiryo"),
panel.background = element_rect(fill = '#F2F5F2'), panel.grid.major = element_line(color = "#404040"), panel.grid.minor = element_blank()) +
labs(fill = "Outcome", x = "Gender of Applicant", y = "Income (in $1,000s)",
title = "On average, male applicants have higher incomes than women",
subtitle = "While approval relies on other factors, the distribution \nof incomes within loan application outcomes differs between genders \nand reflects systemic inequality in approvals or loans, incomes, or both.", caption = "Home Mortgage Disclosure Act Data, CFPB, 2013-2017")
In exploring whether loan approvals processes systemically disadvantage any groups, I have chosen to explore gender as well as race. This graph shows the differences in average income for males and females within each outcome group (approved, denied, or other). Men who are approved for loans typically have higher incomes than women who are approved for loans. Men who are denied also have higher incomes, on average than women who are denied a loan. The other category contains people who did not provide information needed to complete their loan applications, and also reflects the societal differences in incomes by gender.
#loan application outcome rates by race and year (facet_grid)
hmda_ny_5yrs <- hmda_ny_5yrs %>% mutate(income_bins = ifelse(applicant_income_000s %in% c(1:73), "Income Quartile 1",
ifelse(applicant_income_000s %in% c(74:108), "Income Quartile 2",
ifelse(applicant_income_000s %in% c(109:180), "Income Quartile 3",
ifelse(applicant_income_000s %in% c(181:133549), "Income Quartile 4", NA)))))
#total loans per quartile, year, race
quartile_race_year <- hmda_ny_5yrs %>%
group_by(income_bins, race_alternative, as_of_year, county_name) %>%
summarise(total = n())
quartile_race_year$race_alternative[which(quartile_race_year$race_alternative == "Black or African American")] <- "Black"
quartile_race_year_outcome <- hmda_ny_5yrs %>%
group_by(income_bins, race_alternative, outcome, as_of_year, county_name) %>%
summarise(per_outcome = n())
quartile_race_year_outcome$race_alternative[which(quartile_race_year_outcome$race_alternative == "Black or African American")] <- "Black"
quartile_race_year_outcome %>%
left_join(quartile_race_year,
by = c("income_bins", "race_alternative", "as_of_year", "county_name")) %>%
mutate(outcome_rate = per_outcome/total) %>%
filter(race_alternative %in% c("Asian", "White", "Black", "Other"), county_name == "Kings County", !is.na(income_bins)) %>%
ggplot() + geom_line(aes(x = as_of_year, y = outcome_rate, color = outcome)) +
facet_wrap(race_alternative~income_bins) +
scale_color_manual(values = c('#1f8a70', '#E8291A', '#FFE11A')) +
theme(text = element_text(family = "Meiryo"),
panel.background = element_rect(fill = '#F2F5F2'), panel.spacing = unit(1.5, "lines"), legend.position = "bottom", axis.ticks = element_blank()) +
labs(title = "Loan application approval rates for low-to-middle income Blacks in Brooklyn has been decreasing, \nwhile most other races and quartiles have steady approval rates", subtitle = "Rates over time broken down by income quartiles and by race, show that for \nWhites and Asians, approvals have been steadily increasing, \nwhile for Blacks and Others in the first quartile, rates are decreasing ", caption = "Home Mortgage Disclosure Act Data, CFPB, 2013-2017", color = "Outcome", x = "Year", y = "Rate")
By breaking down applicants into income quartiles, and by race, we can look at differences in approvial rates overtime, and identify the groups that are struggling with the changing NYC housing landscape. Approval rates for Black applicants in the first income quartile have been steadily decreasing over the last five years. Many other income groups and racial groups have not seen any decreases in their approval rates over time. This graph also allows one to contrast how high White approval rates are compared to Black approval rates in the first income quartile. The Other category is a combination of American Indians, Native Alaskans, Native Hawaiians, and Pacific Islanders. The Other category has very different approval rates over time for each income quartile, and for the first income quartile, approval rates were below denial rates for most years.